1.  INTRODUCTION

1.1  MOTIVATION 

A hybrid  aircraft  navigation  system  processes  data 
from  a variety  of  sensors  to  produce  estimates  of  the  air- 
craft's attitude,  position,  and  velocity.  These  estimates 
of  the  aircraft's  state  are  displayed  to  the  pilot  and  used 
by the aircraft's automatic control systems. A typical hybrid
navigation system is illustrated in Fig. 1-1.

Figure 1-1  Typical Hybrid Navigation System Structure

Theoretically,  the  most  accurate  estimates  of  the 
system  states  would  be  produced  by  a single  data  processor 
operating  on  all  the  raw  sensor  data  simultaneously.  How- 
ever, such  a centralized  estimation  architecture  requires 
communication  of  large  quantities  of  information  to  a central 
computer.  Furthermore,  an  extremely  fast  central  computer  is 
needed  to  process  all  of  the  raw  data  at  sensor  data  rates. 
Since  the  cost  of  a communications  channel  increases  with 
bandwidth  and  the  cost  of  a computer  increases  with  speed, 
a centralized  estimation  system  can  become  quite  costly. 

Also, a centralized estimation architecture is not conducive
to a modular system design, since a change in one of the sensors
can impact the mechanical and electrical design of the entire
system.

Fortunately,  an  attractive  alternative  to  the  cen- 
tralized estimation  architecture  is  now  available.  Recent 
advances  in  large  scale  integration  (LSI)  electronics  have 
made  data  processing  units  available  which  are  physically 
small,  lightweight,  cheap,  and  low  in  power  consumption. 

Thus  it  is  feasible  to  consider  modular  estimation  architec- 
tures which  perform  relatively  sophisticated  data  preproces- 
sing within  each  sensor  package,  as  illustrated  in  Fig.  1-2. 

In  this  modular  estimation  architecture,  data  preprocessors 
are  operated  at  sensor  data  rates  to  remove  redundancy  in 
the  sensor  data.  The  resulting  compressed  information  is 
transmitted  to  a data  coordinator,  which  supplies  estimates 
at  a slower  rate  to  the  displays  and  controls.  This  redund- 
ancy reduction  allows  a relaxation  of  the  bandwidth  require- 
ments for  communication  channels  which  transmit  data  from 
the  sensors.  Furthermore,  since  computations  are  being  per- 
formed in  parallel  in  this  system,  the  speed  and  capacity 
of  any  one  computer  in  the  system  can  be  much  less  than  that 
required for the computer in the centralized estimation
architecture.

Figure 1-2  Modular Estimation Architecture

A major dividend of the modular architecture results
because electrical and mechanical interfaces between the data
coordinator and certain functional types of sensor/preprocessor
package can be standardized. Then any velocity reference
can be replaced by another velocity reference without affecting
the electrical design of the system. Also, the data preprocessors
can be designed so that a failure of any one
component, even part of the data coordinator, will still
leave the system operational. This increases the system
reliability.

Actually,  state-of-the-art  navigation  systems  already 
use  a form  of  modular  estimation.  A typical  state-of-the-art 
navigation  system,  diagrammed  in  Fig.  1-3,  preprocesses  ac- 
celerometer and  gyro  measurements  by  implementing  a set  of 
equations,  called  mechanization  equations,  to  produce  posi- 
tion, velocity,  and  attitude  estimates.  Signals  from  radio 
navigation  aids  are  also  preprocessed  by  signal  detection 
and  demodulation  techniques,  which  serve  to  compress  the 
data so that information not required for navigation is removed
and relevant information is retained. Although these
preprocessing equations are generally nonlinear, the errors
in the resulting compressed data can usually be modeled as
the outputs of a linear system driven by Gaussian noise.
Therefore, a linear filter can be used to coordinate the
compressed sensor data and determine the minimum-variance
estimates of the navigation system errors. These error estimates
are then used to correct the inertial system outputs,
and the corrected output may be used to reset the mechanization
equations.

Figure 1-3  Typical State-of-the-Art Navigation System


This report presents original work relating to the
design of modular estimators. The mathematical foundations
of a systematic procedure for designing modular estimators
are established and the procedure is applied to a simple
example.


1.2  DESIGN CONSIDERATIONS


In designing a modular estimator, the following factors
must be balanced:

• estimation accuracy
• communication channel bandwidths
• data preprocessor and data coordinator complexity


Estimation accuracy is usually measured in terms of
the mean-square estimation error for some portion of the system
state. In information theory terminology, this is referred
to as a quadratic distortion measure. A quadratic distortion
measure will be assumed almost exclusively in this
report.


The bandwidth of a communications channel is
called its channel capacity or information transfer rate and
is  usually  measured  in  terms  of  the  number  of  bits  per  second 
it  is  capable  of  handling.  The  cost  of  a communications 
channel  increases  as  its  capacity  is  increased;  therefore,  it 
is  desirable  to  keep  the  required  channel  capacity  small.  On 
the  other  hand,  transmitting  the  sensor  data  over  a limited 
capacity  channel  introduces  distortion  (e.g.,  measurement 
quantization  increases  mean-square  estimation-error).  The 
distortion  introduced  depends  on  the  coding  used  to  convert 
the  data  to  a form  which  can  be  transmitted  over  the  channel. 

A particular  coding  scheme  applied  to  data  from  a particular 
source  will  achieve  a specific  information  transfer  rate  (bits 
transferred  per  second)  and  distortion  (e.g.  mean-square 
estimation-error)  corresponding  to  a point  in  the  rate- 
distortion  plane.  As  an  example,  consider  the  problem  of 
sending  the  values  of  Gaussian  random  variables  over  a digital 
channel.  Suppose  one  value  must  be  sent  every  second.  A 
straight-forward  way  of  coding  the  value  is  by  quantization. 

A wide  variety  of  quantization  methods,  which  differ  in  both 
the  number  and  location  of  quantization  steps,  can  be  used. 

The  number  of  quantization  steps  determines  the  number  of 
bits  sent  every  second,  and  the  location  of  the  steps  deter- 
mines the  mean-square  error  in  the  transmitted  value. 
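The quantization example above can be sketched numerically. The following is a minimal illustration, not a method taken from this report: it assumes a unit-variance Gaussian value, a symmetric uniform quantizer with round-off, and simple numerical integration. The function names and the step sizes tried are hypothetical choices. The comparison curve is the standard rate-distortion bound for a Gaussian variable under a quadratic distortion measure, D = sigma^2 * 2^(-2R).

```python
import numpy as np

def uniform_quantizer_mse(bits, step, sigma=1.0, grid=200001):
    """Mean-square error of a symmetric uniform quantizer with 2**bits
    levels applied to a zero-mean Gaussian value (numerical integration)."""
    levels = 2 ** bits
    x = np.linspace(-8.0 * sigma, 8.0 * sigma, grid)
    pdf = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Cell index of each value; values beyond the outermost cells are
    # clipped to the outermost reconstruction levels (overload).
    idx = np.clip(np.floor(x / step), -(levels // 2), levels // 2 - 1)
    xq = (idx + 0.5) * step            # reconstruction levels at cell centers
    dx = x[1] - x[0]
    return float(np.sum((x - xq) ** 2 * pdf) * dx)

def gaussian_rate_distortion_bound(bits, sigma=1.0):
    """Smallest mean-square error achievable at `bits` bits per value."""
    return sigma ** 2 * 2.0 ** (-2.0 * bits)

if __name__ == "__main__":
    for bits in (1, 2, 3, 4):
        for step in (6.0 / 2 ** bits, 4.0 / 2 ** bits):   # hypothetical step sizes
            mse = uniform_quantizer_mse(bits, step)
            bound = gaussian_rate_distortion_bound(bits)
            print(f"bits={bits} step={step:.3f} mse={mse:.4f} bound={bound:.4f}")
```

Each (bits, step) pair corresponds to one point in the rate-distortion plane. The number of levels fixes the rate, the step size moves the point up or down in distortion, and every such point should lie above the bound.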

For  a particular  data  source,  the  relationship  be- 
tween the  channel  capacity  used  and  the  minimum  distortion 
which  can  be  achieved  is  given  by  the  rate-distortion  curve 
(also  called  the  rate-distortion  function)  as  shown  in 
Fig.  1.2-1.  The  rate-distortion  curve  is  the  lower  left 
boundary  of  the  region  of  points  corresponding  to  various 
coding  systems.  Both  the  required  channel  capacity  and  the 
distortion can be minimized by choosing a coding scheme which
gives performance near this curve. Analytical methods, discussed
in Appendix A, can be used to determine the rate-distortion
curve, but in general the coding scheme that will
give performance on this curve when combined with a specific
channel cannot be determined analytically.


Figure 1.2-1  Typical Rate-Distortion Curve

Performance near the rate-distortion curve can be
achieved by using a data preprocessor to remove irrelevant
and redundant information from the sensor data. The complexity
of the preprocessor must be held to a minimum to
minimize costs. Similarly, the data coordinator, which
combines data transmitted from the data preprocessors, must
be kept as simple as possible to keep computer costs down.
Unfortunately, simplifying the data preprocessor increases
the required channel capacity and simplifying the data
coordinator increases the distortion. So a design compromise
must be found.

The  methods  discussed  in  this  report  provide  a sys- 
tematic procedure  for  striking  a balance  among  the  conflicting 
goals  of  minimizing  channel  capacity  requirements,  minimizing 
distortion,  and  minimizing  estimator  complexity.  The  next 
section  presents  an  overview  of  this  procedure. 


1.3  OVERVIEW  OF  DESIGN  PROCEDURE 

The modular estimator design technique presented
here is composed of two major steps: preprocessor design and
data coordinator design. Preprocessor design studies determine
a feasible combination of prefilter structure, times of
transmission, channel capacity, and coding scheme. Data
coordinator design studies are used to develop a practical
data coordinator that will efficiently use all available data.

Data  preprocessors  are  designed  using  the  procedure 
described  in  Chapter  2.  The  minimum-variance  reduced-order 
estimator  design  techniques  of  Ref.  7 are  used  to  study  var- 
ious prefilter  structures  to  determine  the  distortion  limit 
that  can  be  achieved  with  a very  high  capacity  channel.  Next 
the  techniques  developed  in  Appendix  A are  used  to  compute  the 
rate-distortion  curve  for  a selection  of  candidate  prefilters. 
The times of transmitting data from the data preprocessor to
the data coordinator also affect the rate-distortion curves,
so candidate prefilters are studied with a variety of transmission
times. As a result of these studies, a feasible combination
of prefilter, transmission times, and channel capacity
is determined. A coding scheme that approximates the optimal
coding scheme is also determined at this time.

Once feasible preprocessor designs are established,
the data coordinator is designed as described in Chapter 3.
Preprocessor dynamics as well as system dynamics are considered
in this design, but the data coordinator is usually
designed with fewer than the number of states required to
achieve optimal performance. The minimum-variance reduced-order
estimator design procedures of Ref. 7 are used effectively
in data coordinator design. The sensitivity of a
data coordinator design to uncertainties in system parameters
is studied by applying the methods discussed in Chapter 4
and Appendix B.


2.  DATA PREPROCESSOR DESIGN


2.1  PREPROCESSOR STRUCTURES


A data preprocessor is a device or algorithm which
transforms sensor data into an easily used form, removes irrelevant
and redundant information, and encodes the data for
transmission to the user. Therefore, preprocessor operation
can be divided into three basic functions (see Fig. 2.1-1):


• Data Transformation - converting raw sensor
data to system state measurements,
e.g., a doppler receiver converts electromagnetic
signals to a velocity measurement.
Usually this is a nonlinear operation.

• Prefiltering (Data Compression) - reducing
the volume of data which must be communicated
to the data coordinator, e.g., doppler
velocity data may be averaged before
being sent to the data coordinator. Usually
this is a linear operation.

• Encoding - coding the prefiltered data in
a form suitable for transmission over the
channel. For example, velocity data may
be quantized for transmission over a digital
channel. Usually this is a nonlinear
operation.


For convenience, the data transformation will be ignored here.
Equivalently, the data transformation can be thought of as
part of the sensor.

Figure 2.1-1  Typical Data Preprocessor Structure

Although nonlinear operations are sometimes required
to transform raw sensor data to system state measurements,
linear operations are usually adequate for prefiltering.
For example, simple averaging or polynomial fits to the data
are often used for prefiltering (Refs. 1 through 3). Therefore,
only linear prefiltering techniques will be considered
here. Furthermore, only discrete-time prefiltering which
can be performed with digital hardware will be considered.

To  simplify  the  system  design,  the  following  ground- 
rules  will  be  used: 

• Prefilters  will  be  designed  to 
operate  independently  of  other 
information  sources. 

• The information transfer rate will
be assumed to be fixed at a specified
number of bits per transmission.

• A quadratic  distortion  measure  will 
be  used  to  measure  the  performance 
of  the  prefilter. 

The  first  groundrule  limits  the  number  of  interconnections 
between  system  elements  to  the  form  shown  in  Fig.  1-2.  Fur- 
thermore, it  leads  to  a design  which  can  give  acceptable 
performance  when  some  of  the  sensors  fail.  The  other  ground- 
rules  reflect  common  engineering  practice  and  do  not  greatly 
restrict  the  design. 


2.2  OPTIMUM PREPROCESSOR DESIGN

Suppose  that  the  signal  process  produced  by  the 
sensor (possibly after a nonlinear data transformation) can
be modeled as the output of a linear system driven by Gaussian
noise and that the distortion measure is quadratic. Then
results from Refs. 4 and 5 and from Appendix A show that the
optimum preprocessor takes the form shown in Fig. 2.2-1. The
raw sensor data is fed into the Kalman filter for this signal
source, and the state of this Kalman filter is encoded for
transmission to the data coordinator.

Figure 2.2-1  Optimum Preprocessor for Gaussian Signals


Since the state of the Kalman filter is a continuous
random variable, an infinite number of bits would be required
to transmit its exact value. Because the encoder, channel,
and decoder can handle only a finite number of bits per second,
distortion will be introduced in transmitting the state
of the prefilter to the data coordinator. The minimum distortion
which can be achieved with a channel of specified capacity
is given by the rate-distortion curve (see Appendix A).
Also, the rate-distortion curve determines the minimum channel
capacity needed to achieve a specified distortion. Thus,
the rate-distortion curve permits the separation of prefilter

Figure 2.2-2  Typical Rate-Distortion Curve


The rate-distortion curve specifies the channel
capacity needed to achieve a desired level of distortion
provided that an optimum prefilter (Kalman filter) and optimum
encoder-channel-decoder combination are used. During
the calculation of the rate-distortion function, the optimum
combined effect of the encoder, channel, and decoder
is determined. The optimum combination has the appearance
of an encoder-compression matrix multiplying the filter
state, an additive white Gaussian noise introduced by the
channel, and another Kalman filter used as a decoder (see
Fig. 2.2-3). The rate-distortion calculations in Appendix A
determine the optimum encoder compression matrix to
use with a given channel noise. In effect, this determines
the optimum signal-to-noise ratio and the most important part
of the system state to transmit.

Figure 2.2-3  Optimum Encoder-Channel-Decoder Combination

In  most  situations,  the  channel  cannot  be  chosen  to 
be  simply  an  additive  Gaussian  noise  with  a specified  co- 
variance  matrix.  For  example,  digital  communications  chan- 
nels produce  quantization  errors  which  are  neither  additive 
nor  Gaussian.  In  theory,  an  encoder  and  a corresponding  de- 
coder exist,  which  when  combined  with  the  digital  channel, 
would  be  equivalent  to  the  optimum  combination  shown  in 
Fig.  2.2-3.  Unfortunately,  the  structure  of  the  optimum 
encoder  and  decoder  for  a digital  channel  is  not  determined 
during  the  calculation  of  the  rate-distortion  curve.  However, 
the differential pulse code modulation (DPCM) system shown in
Fig. 2.2-4 can achieve performance close to the theoretical
bound (Ref. 6). The DPCM encoder removes information
which can be predicted by the decoder using previously transmitted
information and a system dynamics model. The same
dynamics model is also used at the receiver to reconstruct
the original signal. The reconstruction gain used in both
the encoder and decoder is similar to the optimum gain in a
Kalman filter.
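The DPCM idea described above can be illustrated with a scalar sketch. This is an assumption-laden toy, not the system of Fig. 2.2-4: it uses a hypothetical first-order dynamics model (coefficient `a`), a fixed reconstruction gain, and a uniform quantizer with round-off. The class and parameter names are invented for illustration. The encoder and decoder each run the same predictor, so only the quantized prediction residual crosses the channel.

```python
import math

class DpcmCodec:
    """Shared predictor/reconstructor used identically by encoder and decoder."""

    def __init__(self, a, gain, step, bits):
        self.a = a                        # assumed scalar dynamics model
        self.gain = gain                  # assumed reconstruction gain
        self.step = step                  # quantizer step size
        self.half_levels = 2 ** bits // 2
        self.estimate = 0.0               # reconstructed signal value

    def _predict(self):
        # Prediction from previously transmitted information only.
        return self.a * self.estimate

    def _update(self, index):
        residual = (index + 0.5) * self.step      # dequantized residual
        self.estimate = self._predict() + self.gain * residual
        return self.estimate

    def encode(self, sample):
        # Quantize only the part of the sample the decoder cannot predict.
        index = math.floor((sample - self._predict()) / self.step)
        index = max(-self.half_levels, min(self.half_levels - 1, index))
        self._update(index)               # track what the decoder will hold
        return index

    def decode(self, index):
        return self._update(index)

def run_dpcm(signal, a=1.0, gain=1.0, step=0.5, bits=4):
    """Encode a sequence, then decode it with an independent codec instance."""
    encoder = DpcmCodec(a, gain, step, bits)
    decoder = DpcmCodec(a, gain, step, bits)
    indices = [encoder.encode(s) for s in signal]
    return indices, [decoder.decode(i) for i in indices]
```

With a slowly drifting signal, each transmitted index stays within the fixed number of bits even though the signal itself grows without bound, which is the property the text relies on.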

The complexity of the Kalman filter used for the optimum
data prefilter often precludes its implementation.
Therefore, preprocessor designs which approximate the optimum
but are less complex are discussed in the next section.
Since the use of suboptimal prefilters introduces additional
distortion in the data transmission, techniques for quantifying
this performance degradation are also discussed.


2.3  SUBOPTIMAL PREPROCESSOR DESIGN

Instead  of  estimating  the  entire  state,  the  data 
prefilter  can  be  designed  to  estimate  only  a portion  of  the 
sensor  and  system  state  vector.  This  allows  a reduction  in 
the complexity of the preprocessor. In this case, $y(t_n)$ is
designed to be an estimate of $S(t_n)\,x(t_n)$, where $x(t_n)$ is the
system state vector at time $t_n$ and $S(t_n)$ is a matrix which
selects linear combinations of the states for estimation.

For  example,  it  is  reasonable  to  choose  the  output  of  the 
prefilter  associated  with  a doppler  radar  receiver  to  be 
an  estimate  of  aircraft  velocity. 

One  of  the  simplest  prefilters  merely  averages  the 
raw  data.  This  type  of  prefilter  is  most  useful  when  the 
signal  can  be  assumed  to  be  essentially  constant  between 
data  transmission  times  and  when  measurement  errors  can  be 
assumed  to  be  white  noise.  When  these  assumptions  are  not 
justified,  more  sophisticated  prefiltering  may  be  required. 
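The averaging prefilter described above can be sketched in a few lines. This is a minimal illustration under assumed conditions: a fixed number of raw samples between transmission times and a hypothetical function name. It is not a design from this report.

```python
import numpy as np

def averaging_prefilter(samples, block):
    """Average each group of `block` consecutive raw samples.

    One averaged value is produced per transmission interval; any
    incomplete trailing group is dropped.
    """
    samples = np.asarray(samples, dtype=float)
    n_blocks = len(samples) // block
    return samples[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)
```

If the signal is constant over an interval and the measurement errors are white with variance r, the average of N samples has error variance r/N, which is why this simple prefilter loses little under the assumptions stated in the text.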

The minimum-variance reduced-order (MVRO) estimator
techniques of Ref. 7 provide tools for designing
prefilters which minimize the variance of the estimation error
subject to constraints on the complexity of the estimator.
Assume that the state variable representation of the system
is

$$ x(t_{n+1}) = \Phi(t_n)\,x(t_n) + w(t_n) \tag{2.3-1} $$

and the measurements taken between $t_n$ and $t_{n+1}$ can be
represented by

$$ z(t_n) = H(t_n)\,x(t_n) + v(t_n) \tag{2.3-2} $$

Then a MVRO estimator for the system is described by the
equations

$$ m(t_{n+1}) = T(t_{n+1})\,\Phi(t_n)\,C(t_n)\,m(t_n) + T(t_{n+1})\,\Phi(t_n)\,K(t_n)\,\tilde{z}(t_n) \tag{2.3-3} $$

$$ y(t_n) = S(t_n)\,C(t_n)\,m(t_n) + S(t_n)\,K(t_n)\,\tilde{z}(t_n) \tag{2.3-4} $$

where

$z(t_n)$ = the raw sensor measurement at time $t_n$

$m(t_n)$ = a memory of measurements prior to $t_n$

$y(t_n)$ = the compressed data vector at time $t_n$

$\tilde{z}(t_n) = z(t_n) - H(t_n)\,C(t_n)\,m(t_n)$  (2.3-5)
= the innovations for the reduced-order estimator

and where $T(t_{n+1})$ is a matrix of design parameters which
selects which portion of the state the memory vector, $m(t_{n+1})$,
will estimate. The matrices $C(t_n)$ and $K(t_n)$ are determined
completely by $T(t_n)$ and the system parameters. The innovations
sequence is the new information supplied to the
reduced-order estimator by the measurements. The innovations
sequence for a Kalman filter is white, but the innovations
sequence for a reduced-order estimator is correlated.


A diagram  of  a MVRO  estimator  for  the  system  is  shown 
in  Fig.  2.3-1.  Unfortunately,  the  computations  required  to 
determine  the  optimum  parameters  C(tn)  and  K(tn)  are  quite 
complex.  If  the  application  permits  these  parameters  to  be 
precomputed  and  stored,  then  the  MVRO  estimator  can  be  used 
as  the  prefilter.  When  this  is  not  possible,  heuristic  pre- 
filter designs  must  be  used.  However,  even  in  this  case, 
the  MVRO  estimator  design  procedures  can  provide  guidelines 
for  prefilter  design  and  a measure  of  how  well  a prefilter 
with a given state selection can be expected to perform.

In general, the data prefilter can be described by
the equations

$$ m(t_{n+1}) = A(t_n)\,m(t_n) + B(t_n)\,\tilde{z}(t_n) \tag{2.3-6} $$

$$ y(t_n) = S(t_n)\,C(t_n)\,m(t_n) + S(t_n)\,K(t_n)\,\tilde{z}(t_n) \tag{2.3-7} $$

$$ \tilde{z}(t_n) = z(t_n) - G(t_n)\,m(t_n) \tag{2.3-8} $$

Figure 2.3-2 shows the block diagram for this representation
of the data prefilter. Note that the MVRO prefilter has this
structure with

$$ A(t_n) = T(t_{n+1})\,\Phi(t_n)\,C(t_n) \tag{2.3-9} $$

$$ B(t_n) = T(t_{n+1})\,\Phi(t_n)\,K(t_n) \tag{2.3-10} $$

$$ G(t_n) = H(t_n)\,C(t_n) $$
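The general prefilter recursion above can be sketched as a single update step. This is a minimal illustration with hypothetical function names and time-invariant matrices assumed for brevity. It mirrors Eqs. (2.3-6) through (2.3-10) as reconstructed here, with the innovations driving both the memory update and the output. The gains C and K are taken as given, since their computation (Ref. 7) is outside this sketch.

```python
import numpy as np

def mvro_matrices(T_next, Phi, C, K, H):
    """Form A, B, G for the MVRO prefilter from T, Phi, C, K, H."""
    A = T_next @ Phi @ C      # Eq. (2.3-9)
    B = T_next @ Phi @ K      # Eq. (2.3-10)
    G = H @ C                 # measurement prediction matrix
    return A, B, G

def prefilter_step(m, z, A, B, G, S, C, K):
    """One cycle of the general linear prefilter.

    Returns the next memory vector, the compressed data vector, and
    the innovations.
    """
    innov = z - G @ m                     # Eq. (2.3-8)
    y = S @ (C @ m + K @ innov)           # Eq. (2.3-7)
    m_next = A @ m + B @ innov            # Eq. (2.3-6)
    return m_next, y, innov
```

As a consistency check, choosing T and C as identity matrices (no order reduction) collapses the step to the familiar Kalman measurement update followed by propagation through Phi.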


When  a suboptimal  prefilter  is  used,  the  optimal 
encoder-channel-decoder  combination  can  be  determined  by 
considering  the  prefilter  as  part  of  the  sensor  dynamics. 

The  optimum  decoder  is  the  Kalman  filter  which  models  both 
the  system  and  prefilter  dynamics.  The  rate-distortion  func- 
tions for  a selection  of  candidate  prefilters  can  be  deter- 
mined using  the  methods  of  Appendix  A.  The  prefilter  whose 
rate-distortion curve lies closest to the "system with sensor"
(optimal prefilter) rate-distortion curve at the desired
transmission rate is judged to be best (see Fig. 2.3-3).
In some cases, the optimum encoder-channel-decoder is not
practical to implement because it involves building a Kalman
filter which models both the system and prefilter dynamics.
In these situations either the MVRO estimator design techniques
of Ref. 7 or the following heuristic procedure can be used
to determine an encoder-channel-decoder combination which is
practical to implement:

(1)  The prefilter is put in the form of
Fig. 2.3-2. This can be done for most
linear prefilters (not just MVRO prefilters).

(2)  The innovations sequence is replaced
by a white noise sequence with the
same variance. (The innovations sequence
for a Kalman filter is already
white.)

(3)  The methods of Appendix A are used to
compute the encoder compression matrix
H* and the reconstruction gain matrix
K*.

Figure 2.3-3  Typical Rate-Distortion Functions


This encoder-channel-decoder design procedure is
justified by the following heuristic arguments:

• The optimum prefilter produces a white
innovations sequence; therefore, a prefilter
which is "near optimum" should
have a "nearly white" innovations
sequence.

• Replacing the correlated innovations
sequence with a white sequence can
only increase the required data transmission
rate. Therefore the heuristic
design technique described above provides
an upper bound on the transmission
rate required to achieve a
specified distortion with a given
prefilter.

• If DPCM encoding is used, the prefilter
will be effectively modeled as though
its innovations sequence were white.
Therefore, this heuristic design procedure
is consistent with DPCM.


When the output of the prefilter is to be transmitted
over a digital channel, the natural choice for an
encoder is a differential pulse code modulator (DPCM) which
uses the reduced-order system dynamics model of the prefilter.
Therefore, the use of a reduced-order prefilter
results in a simplification of both the prefilter and the
encoder.


2.4  EXAMPLE  OF  PREPROCESSOR  DESIGN 

In  this  section  a simple  example  is  used  to  illus- 
trate the  techniques  which  can  be  employed  in  designing  a 
preprocessor.  The  example  problem  shown  in  Fig.  2.4-1  was 
chosen  to  emphasize  the  reduction  in  channel  capacity  which 
can  be  achieved  by  using  a data  preprocessor.  The  rms 
acceleration  was  taken  to  be 


$$ \sigma_1(0) = 500\ \text{ft} \tag{2.4-2} $$

$$ \sigma_2(0) = 50\ \text{ft/sec} \tag{2.4-3} $$


The rms position measurement error was 100 ft and the rms velocity
measurement error was 100 ft/sec with a correlation time,
$\tau$, of 10 seconds. This example can be considered to be a
highly simplified model of a dead-reckoning navigation system
for a highly maneuverable vehicle. The resulting preprocessor
may be similar to the type of preprocessor which would be
used with a Global Positioning System (GPS) receiver. Since
this system is unstable, the number of bits required to transmit
whole values of position and velocity grows with time;
therefore, some type of preprocessor (e.g., a simple differencing
operation) is mandatory. The preprocessor removes
redundancy in the data and permits the number of bits required
at each transmission time to be bounded. The Kalman
filter is the optimum prefilter and is studied first. Suboptimal
preprocessors using MVRO prefilters are considered
next.


Using a Kalman prefilter to remove redundancy in the
measurements and using a scheme such as differential pulse
code modulation (DPCM) to transmit only the changes in the
Kalman filter state greatly reduces the required channel
capacity. The techniques discussed in Appendix A were used
to compute the distortion introduced by constraints on the
channel capacity. Figure 2.4-2 shows the results. It was
assumed that the steady-state rms position error was the only
measure of distortion. The Kalman prefilter has a steady-state rms
position error of 37.6 ft. The additional distortion introduced
by the finite capacity of the preprocessor-to-data-coordinator
communications channel is plotted in Fig. 2.4-2. The total
rms position error is obtained by root-sum-squaring the number
obtained from this graph with the prefilter rms error.
Two types of optimum encoders were studied: one which transfers
a fixed number of bits every 0.1 second and one which
transfers a fixed number of bits every second. The information
transmission rate (in bits/second) was varied and the
resulting distortion plotted.

Figure 2.4-2  Additional Preprocessor Distortion vs Transmission Rate
(horizontal axis: information transmission rate, bits/sec; curves:
ten-transmissions-per-second encoder and one-transmission-per-second encoder)

Note that in all cases, the one-transmission-per-second
encoder introduces less distortion than the
ten-transmissions-per-second encoder when the same bit rates are
used. This verifies that the prefilter is indeed performing
data compression.

Figure 2.4-2 shows that a channel capacity of four
bits per second used in conjunction with the one-transmission-per-second
encoder adds a negligible distortion to the prefilter
distortion. Therefore, this will be taken as the
design point. Now a practical encoder which approaches these
optimum characteristics will be developed for a digital channel.
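The root-sum-square rule used with Fig. 2.4-2 can be checked with a short calculation. The sketch below is illustrative only: the 37.6 ft prefilter error comes from the text, but the additional-distortion values are hypothetical stand-ins, since the plotted values are not reproduced here, and the function name is invented.

```python
import math

def total_rms_error(prefilter_rms, additional_rms):
    """Root-sum-square of the prefilter error and the channel-induced error."""
    return math.hypot(prefilter_rms, additional_rms)

if __name__ == "__main__":
    prefilter_rms_ft = 37.6                   # from the text
    for additional_ft in (1.0, 5.0, 20.0):    # hypothetical graph readings
        total = total_rms_error(prefilter_rms_ft, additional_ft)
        print(f"additional={additional_ft:5.1f} ft  total={total:6.2f} ft")
```

Because the two errors combine in quadrature, an additional distortion that is small relative to 37.6 ft changes the total very little, which is the sense in which the design point adds negligible distortion.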

In Appendix A, it is shown that the optimum encoder-channel-decoder
combination can be modeled as a Kalman filter
operating on a set of measurements

$$ z^*_n = H^*_n\,\hat{x}_n + v^*_n \tag{2.4-4} $$

where $v^*_n$ is white Gaussian noise with variance $R^*$ and $H^*_n$ is
computed by an algorithm described in Section A.3. At the design
point chosen for this example, the steady-state value of $H^*$ is

$$ H^* = \sqrt{R^*}\,[\,0.1515\ \text{ft}^{-1},\ 0,\ 0,\ 0\,] \tag{2.4-5} $$

A DPCM system is chosen for the encoder. Since four bits per
second sent at one-second intervals was chosen as the design
point, four bits or 16 levels are used for the quantizer. Results
from Ref. 14 are summarized in Fig. 2.4-3. This curve
gives the ratio of the per-transmission rms quantization error
$\sqrt{R^*}$ to the quantization level $\ell$ as a function of the
number of bits used in each transmission. It is assumed that
equally spaced quantization steps of optimum length are used
and that round-off is performed rather than truncation. For
four bits of quantization, this curve shows that

$$ \sqrt{R^*} = 0.320\,\ell $$

Figure 2.4-3  Ratio of rms Quantization Error to Quantization Level
(horizontal axis: number of bits of quantization; equally spaced
levels of optimum length are assumed)

The optimum gain for signal reconstruction is

In this particular case

$$ K^* = \begin{bmatrix} 10.3/\ell \\ 10.9/\ell \\ 5.25/\ell \\ 0.00367/\ell \end{bmatrix} $$


The  resulting  channel-encoder  and  channel-decoder 
are  shown  in  Figs.  2.4-4  and  2.4-5.  Note  that  the  dynamics 
models  used  do  not  include  the  dynamics  of  the  correlated 
velocity-measurement  noise  because  this  state  does  not  need 
to be estimated by the channel-encoder and channel-decoder.

Figure 2.4-4  Example Channel-Encoder

Figure 2.4-5  Example Channel-Decoder

Figure 2.4-6 compares the rate-distortion functions for the optimal
prefilter and two suboptimal prefilters using optimal
decoders. One prefilter models all states except measurement
correlation; the other models all states except acceleration
correlation. It can be seen that the latter performs very near
the optimum and so would be the better choice for a suboptimum
prefilter. Of course, achieving the performance shown in
Fig. 2.4-6 requires the use of an optimum decoder which models
both the system states and the prefilter states. However, the
MVRO estimator techniques of Ref. 7 can be used to design simpler
decoders which may achieve performance near that shown in
Fig. 2.4-6.


3.  DATA COORDINATOR DESIGN


3.1  OPTIMAL  DATA  COORDINATOR  DESIGN 

Once the preprocessor structure has been fixed, a data coordinator must be designed to combine all of the compressed data and produce estimates of the system states for control and display. If a linear system driven by Gaussian noise adequately models the system, then a Kalman filter preserves all relevant information about the state. Therefore the best data coordinator is a Kalman filter. It is, however, the Kalman filter for the augmented system, which includes the data prefilter dynamics.

If a single sensor with an optimum prefilter has been used, then the data coordinator design becomes particularly simple. Since the innovations sequence in the optimum prefilter (refer to Fig. 2.2-1) is a white Gaussian sequence, the model for the augmented system which produces the preprocessed data sequence can be simplified to the form shown in Fig. 3.1-1. Furthermore, if differential pulse code modulation (DPCM) is used to encode each data channel, then the optimum data coordinator takes the form shown in Fig. 3.1-2.

Figure 3.1-1  Simplified Model of Augmented System with Optimum Preprocessor

Figure 3.1-2  Optimum Data Coordinator for Optimum Prefiltering and DPCM



If non-optimal prefiltering is used, or if more than one sensor is in operation, then the optimum data coordinator takes the form shown in Fig. 3.1-3. In cases such as this, implementing the optimum data coordinator may require considerable computer resources and would therefore defeat one of the purposes of data preprocessing. Thus a suboptimal data coordinator should be used. In the next section, techniques for designing suboptimal data coordinators which do not exceed computational constraints are discussed.


A heuristic approach to data coordinator design is to consider the data coming from a preprocessor as a measurement of the state

    $z^* = H^* x + v^*$

where H* is the encoder compression matrix described in Appendix A and v* is assumed to be a white Gaussian noise composed of prefilter estimation errors and data transmission noise. The trouble with this approach is that the measurement noises are in general not white and may be correlated between channels.


3.2  SUBOPTIMAL  DATA  COORDINATOR  DESIGN 


Usually constraints must be imposed on the complexity of the data coordinator so that timing and computer capacity restrictions can be met. The optimum data coordinator requires a dynamics model which includes system, sensor, and preprocessor models and therefore may be too complex to implement. The MVRO estimator design techniques of Ref. 7 can be used to develop a data coordinator of constrained complexity. An MVRO estimator uses, as a memory, a dynamics model for only a portion of the states. Thus, any constraints on complexity can be met by an MVRO estimator with a suitably chosen memory. For a specific selection of memory states, the MVRO estimator produces the most accurate estimate of the states required for display and control. In some cases, it may be possible to use an MVRO estimator as the data coordinator; in other cases, it may not be practical to implement the MVRO estimator because of the complexity of the computations required to obtain the optimum estimator parameters. In any case, the MVRO estimator design techniques provide guidelines for designing and evaluating data coordinators.
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3.3  DATA  COORDINATOR  EVALUATION 

Like a data preprocessor, a data coordinator can be evaluated by determining how much information about the state has been sacrificed to meet complexity constraints. This sacrificed information is reflected by an increase in the mean-square estimation error. Since it can usually be assumed that linear dynamics models apply, computer programs which have been developed to evaluate reduced-order estimators can be used to determine the least squares estimation error of a particular data coordinator-preprocessor combination. Note that the data coordinator cannot be evaluated independently of the data preprocessors, because the performance of the data coordinator depends on the particular preprocessors used. The next section presents techniques which can be used to evaluate a suboptimal data coordinator.
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Although  the  software  used  in  the  data  coordinator 
depends  on  the  particular  sensors  and  preprocessors  used,  the 
hardware  design  of  the  data  coordinator-preprocessor  interface 
can  be  standardized.  This  is  because  the  bit-rates  and  trans- 
mission intervals  depend  mainly  on  the  general  type  of  sensor 
(e.g., a velocity sensor) and on the system dynamics. Therefore, electrical and mechanical interfaces can be standardized, and only software changes need be made when a sensor is changed.
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A reduced-order  estimator  is  a data  processor  which 
uses  a memory  of  lower  order  than  that  required  for  optimal 
estimation.  In  a modular  estimator  design,  reduced-order 
estimators  may  be  used  for  prefilters  and  will  almost  cer- 
tainly be  used  for  the  data  coordinator.  The  performance  of 
these  reduced-order  estimators  depends  on  the  signals  they 
receive  as  inputs.  In  this  section,  the  sensitivity  of 
reduced-order  estimators  to  changes  in  their  input  signal 
structure  is  investigated.  The  investigation  consists  of 
the  following  steps: 


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• A reduced-order  estimator  is  designed 
based on a fixed model, called the design model.

• A different model, called the reference model, is used to describe a possible real-world situation.

• The  covariance  of  the  error  in  the  state 
estimate  is  computed  using  the  equations 
derived  in  this  report. 

• The  performance  of  the  reduced-order 
estimator  for  a class  of  reference  models 
is  compared  with  that  of  other  estimators 
and  with  performance  specifications. 


The  techniques  presented  in  this  section  can  be  used 
to  analyze  prefilter  designs  to  determine  their  sensitivity 
to  changes  in  the  system  model.  However,  the  most  important 
use  of  these  techniques  is  in  analyzing  the  performance  of 
the  data  coordinator  and  thereby  the  performance  of  the 
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complete  modular  estimator.  To  analyze  the  data  coordinator, 
the  reference  system  is  the  augmented  system  which  includes 
the  prefilters.  The  measurements  are  the  outputs  of  the 
encoder-channel-decoder  combination  and  so  take  the  form  of 
linear  combinations  of  the  augmented  states  corrupted  by  white 
Gaussian  noise.  The  prefilter  designs  are  assumed  fixed  and 
the  system  model  is  varied  to  determine  the  sensitivity  of  the 
data coordinator. If the sensitivity is too great, not only the data coordinator but also the prefilters may need to be redesigned.


4.2  ERROR  COVARIANCE  EQUATIONS 


Both the design model and the reference model take the form shown in Fig. 4.2-1. The system state is assumed to satisfy the linear vector difference equation

    $x_{n+1} = \Phi_n x_n + w_n$    (4.2-1)

where

    $x_n$ is the state at time n
    $w_n$ is a vector of zero mean white noises
    and $\Phi_n$ is the state transition matrix.

Figure 4.2-1  Form of Design and Reference Models

The measurements $z_n$ are assumed to be related to the state of the system by the linear vector equation

    $z_n = H_n x_n + v_n$    (4.2-2)

where $v_n$ is a vector of zero mean white noises and $H_n$ is the measurement matrix.

The state estimate is assumed to take the form

    $\hat{x}_{n|n} = C_n m_n + K_n \nu_n$    (4.2-3)

where

    $\nu_n$ is the new information (innovations) supplied by the nth measurement $z_n$
    $m_n$ is a memory vector (usually of reduced order)


Here $m_n$ is assumed to satisfy the difference equation

    $m_{n+1} = A_n m_n + B_n \nu_n$    (4.2-4)

and $\nu_n$ is given by

    $\nu_n = z_n - G_n m_n$    (4.2-5)

The matrices $A_n$, $B_n$, $C_n$, $G_n$, and $K_n$ completely define the reduced-order estimator.
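
To make the roles of the five matrices concrete, the following sketch runs one scalar instance of Eqs. (4.2-3) through (4.2-5) alongside an ordinary fixed-gain filter. It assumes that the memory holds the one-step prediction, so that A = phi, B = phi*k, C = 1, G = h and K = k; this choice, the function names, and the measurement values are illustrative assumptions and are not taken from the report.

```python
def reduced_order_step(m, z, A, B, C, G, K):
    """One cycle of Eqs. (4.2-3) to (4.2-5), written for scalar quantities."""
    nu = z - G * m            # innovations, Eq. (4.2-5)
    x_hat = C * m + K * nu    # state estimate, Eq. (4.2-3)
    m_next = A * m + B * nu   # memory update, Eq. (4.2-4)
    return x_hat, m_next

def fixed_gain_filter_step(x_pred, z, phi, h, k):
    """Conventional measurement update followed by time propagation."""
    x_filt = x_pred + k * (z - h * x_pred)
    return x_filt, phi * x_filt

phi, h, k = 0.9, 1.0, 0.5              # hypothetical scalar model and gain
measurements = [1.0, 0.7, 1.2, 0.4]    # hypothetical measurement values

m = x_pred = 0.0
estimates_general, estimates_filter = [], []
for z in measurements:
    x_hat, m = reduced_order_step(m, z, A=phi, B=phi * k, C=1.0, G=h, K=k)
    x_filt, x_pred = fixed_gain_filter_step(x_pred, z, phi, h, k)
    estimates_general.append(x_hat)
    estimates_filter.append(x_filt)

print(estimates_general)
print(estimates_filter)
```

The two lists agree to rounding error, which illustrates that a full-order filter is one particular member of the family defined by these matrices; a reduced-order estimator simply uses a memory of smaller dimension.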


The prime concern of estimator performance analysis is to determine $P_{n|n}$, the covariance matrix for the filtering error

    $\tilde{x}_{n|n} = x_n - \hat{x}_{n|n}$    (4.2-6)



However, to save computations, the error covariance matrix for the one-step prediction error is computed first and then used to compute the filtering error covariance matrix. The one-step prediction error is defined to be

    $\tilde{x}_{n+1|n} = x_{n+1} - C_{n+1} m_{n+1}$    (4.2-7)



In  Appendix  B it  is  shown  that 
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(4.2-10) 


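Furthermore, Appendix B shows that $\Pi_n$ satisfies the difference equation

    $\Pi_{n+1} = \Lambda_n \Pi_n \Lambda_n^T + \Delta_n R_n \Delta_n^T + \begin{bmatrix} Q_n & 0 \\ 0 & 0 \end{bmatrix}$    (4.2-11)

where the matrices appearing in this recursion, including $\Lambda_n$ and $\Pi_n$, are defined in Appendix B.
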

Therefore, the computation of the error covariance involves two steps.

(1)  The recursion equation for $\Pi_n$ is mechanized. A formula for the starting value $\Pi_0$ is given in Appendix B.

(2)  The estimation error covariance matrix is computed as needed.
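
The two steps can be sketched in code. The sketch below is derived directly from Eqs. (4.2-1) through (4.2-7) rather than copied from Appendix B, so its notation is an assumption: it stacks the state and the memory into one vector [x; m], propagates the covariance of that vector, and then forms the error covariances as needed. The models are taken to be time-invariant, and Q and R stand for the covariances of the process and measurement noises. All function names and numerical values are illustrative.

```python
def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * a for a in row] for row in X]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def hstack(X, Y):
    return [rx + ry for rx, ry in zip(X, Y)]

def quad(X, S):
    """Return X S X^T."""
    return matmul(matmul(X, S), transpose(X))

def augmented_matrices(Phi, H, A, B, G):
    """Transition and noise-input matrices for the stacked vector [x; m]."""
    n, p, q = len(Phi), len(A), len(H)
    top = hstack(Phi, zeros(n, p))
    bottom = hstack(matmul(B, H), add(A, scale(matmul(B, G), -1.0)))
    return top + bottom, zeros(n, q) + B

def propagate(Pi, Lam, Delta, Q, R):
    """Step (1): one step of the covariance recursion for [x; m]."""
    n, p = len(Q), len(Pi) - len(Q)
    Q_aug = hstack(Q, zeros(n, p)) + zeros(p, n + p)
    return add(add(quad(Lam, Pi), quad(Delta, R)), Q_aug)

def prediction_error_cov(Pi, C):
    """Step (2): covariance of the prediction error x - C m."""
    return quad(hstack(identity(len(C)), scale(C, -1.0)), Pi)

def filtering_error_cov(Pi, C, G, K, H, R):
    """Step (2): covariance of x - (C m + K nu), with nu = H x + v - G m."""
    S = hstack(add(identity(len(C)), scale(matmul(K, H), -1.0)),
               scale(add(C, scale(matmul(K, G), -1.0)), -1.0))
    return add(quad(S, Pi), quad(K, R))

def steady_state_errors(phi_ref, phi_design=0.9, q=1.0, r=1.0, steps=500):
    """Scalar illustration: estimator designed for phi_design, evaluated against phi_ref."""
    M = 1.0
    for _ in range(steps):                      # Riccati iteration for the design model
        M = phi_design ** 2 * (M - M * M / (M + r)) + q
    k = M / (M + r)                             # steady-state gain of the design model
    A, B, C, G, K = [[phi_design]], [[phi_design * k]], [[1.0]], [[1.0]], [[k]]
    Phi, H, Q, R = [[phi_ref]], [[1.0]], [[q]], [[r]]
    Lam, Delta = augmented_matrices(Phi, H, A, B, G)
    Pi = [[5.0, 0.0], [0.0, 0.0]]               # assumed start: state variance 5, zero memory
    for _ in range(steps):
        Pi = propagate(Pi, Lam, Delta, Q, R)
    return M, prediction_error_cov(Pi, C)[0][0], filtering_error_cov(Pi, C, G, K, H, R)[0][0]

for phi_ref in (0.9, 0.7, 0.5):
    M, pred_var, filt_var = steady_state_errors(phi_ref)
    print(f"reference phi = {phi_ref}: prediction variance {pred_var:.4f}, "
          f"filtering variance {filt_var:.4f}")
```

When the reference model equals the design model, the prediction variance returned by the recursion coincides with the Riccati solution of the design model; changing `phi_ref` while holding the estimator fixed gives the kind of sensitivity curve discussed in the examples.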


In  the  next  section,  some  simple  examples  are  used  to 
illustrate  the  types  of  sensitivity  analysis  studies  which  may 
be  performed  using  these  error  covariance  equations. 
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4 . 3 EXAMPLES 

The tracking problem illustrated in Fig. 4.3-1 was chosen as the basis for all the examples in this section. This tracking problem can be viewed as a highly simplified model of a dead-reckoning navigation system for a highly maneuverable vehicle. The acceleration $x_3$ is assumed to be a correlated Gaussian noise process. The velocity $x_2$ results from integrating the acceleration, and the position $x_1$ is simply the integral of velocity.

The rms value of the acceleration process was taken to be

    $\sigma_3 = 120\ \mathrm{ft/sec^2}$    (4.3-1)
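
For simulation purposes, a correlated acceleration of this kind is often represented in discrete time as a first-order Gauss-Markov sequence. The sketch below assumes that form, which the report does not state explicitly, and uses the 0.2 second measurement interval and the five second design correlation time quoted for the examples; the function name is illustrative.

```python
import math

def discrete_markov(sigma, tau, dt):
    """Discrete equivalent of a first-order Gauss-Markov process.

    a[n+1] = phi * a[n] + w[n], with w[n] white and of variance q, chosen so
    that the stationary rms value of a[n] equals sigma.
    """
    phi = math.exp(-dt / tau)
    q = sigma ** 2 * (1.0 - phi ** 2)
    return phi, q

phi, q = discrete_markov(sigma=120.0, tau=5.0, dt=0.2)
stationary_rms = math.sqrt(q / (1.0 - phi ** 2))
print(f"phi = {phi:.4f}, driving-noise rms = {math.sqrt(q):.2f} ft/sec^2, "
      f"stationary rms = {stationary_rms:.1f} ft/sec^2")
```

Shortening `tau` moves `phi` toward zero, so the acceleration looks more like white noise from sample to sample; this is the parameter varied in the sensitivity study.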
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Figure  4.3-1  Tracking  Problem 

The rms value of the initial position uncertainty was assumed to be

    $\sigma_1(0) = 500\ \mathrm{ft}$    (4.3-2)

and the rms value of the initial velocity uncertainty was chosen to be

    $\sigma_2(0) = 50\ \mathrm{ft/sec}$    (4.3-3)


Different examples were created by assuming different measurement mechanisms. In all examples, measurements were assumed to be taken at 0.2 second intervals.

As a first example, it was assumed that a measurement of position is available. This measurement was corrupted by an additive, white, Gaussian noise $v_n$ with 100 ft standard deviation. The sensitivity of estimator performance to variations in the acceleration correlation time $\tau$ was investigated. A two-state estimator which used a memory of position and velocity only was designed using the MVRO observer-estimator techniques with the design value of $\tau$ equal to five seconds. The sensitivity of this estimator was compared to that of a three-state estimator designed for $\tau$ equal to five seconds. Figure 4.3-2 shows a comparison of the rms position error at 25 seconds.
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Figure  4.3-2 


Sensitivity of Estimator to Changes in Acceleration Correlation Time

Since the three-state estimator is the Kalman filter for the design model, the three-state estimator performs better than the two-state estimator at the design point. However, as the acceleration correlation time in the reference model is decreased, the two curves cross. For acceleration correlation times between 0.1 and 1.0 seconds, the two-state estimator performs better than the three-state estimator. Furthermore, the peak of the two-state estimator curve is lower than the peak of the three-state estimator curve. Therefore, considering worst case performance for the acceleration correlation time in the range shown in Fig. 4.3-2, the two-state estimator is preferable to the three-state estimator.

Of course, this does not mean that the two-state estimator tested here is preferable to all three-state estimators. It means that, for the design model chosen, the two-state estimator is less sensitive to errors in modeling the acceleration correlation time. When the acceleration correlation time is shorter than expected, the performance of the three-state estimator degenerates because it relies too heavily on its memory of acceleration.


A second set of examples was studied using the same basic system model modified as shown in Fig. 4.3-3. The acceleration correlation time was fixed at 5.0 seconds, and in addition to the position measurement, a velocity measurement was taken every 0.1 seconds. This velocity measurement was assumed to be corrupted by a zero-mean, Gaussian measurement noise with a standard deviation of 100 ft/sec and correlation time $\tau$. An additional state must be added to the system model to account for this correlated measurement noise. Therefore, a four-state estimator is required to implement the Kalman filter for a given design model of this system. Two design points were chosen: $\tau$ = 0.1 seconds and $\tau$ = 10 seconds. The performance of the resulting four-state estimators was compared to that of two corresponding three-state estimators designed using MVRO techniques.
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Figure  4.3-3  Tracking  Problem  with  Velocity 

and  Position  Measurements 


The three-state estimators used a memory of position, velocity and acceleration, but did not attempt to "remember" the velocity measurement error. Figure 4.3-4 shows the variation in steady state rms position error for these estimators as the correlation time of the velocity measurement error is changed in the reference model. Figure 4.3-5 shows the variation of rms velocity error. If it is believed that the velocity-measurement-error correlation time may vary over the indicated range of values, then the estimator whose performance curve has the lowest peak is most desirable. Adopting this criterion, the four-state estimator with design $\tau$ = 10.0 seconds has the best position error performance. However, the three-state estimator with design $\tau$ = 10.0 seconds performs very nearly as well in position error. Furthermore, the latter estimator performs much better in terms of velocity error. Therefore, the three-state estimator with design $\tau$ = 10.0 seconds would probably be the most desirable of the four estimators tested.
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Other  conclusions  can  be  drawn  from  these  curves. 

For example, consider the curves for the four-state estimator with design $\tau$ = 10.0 seconds. Note that a mismodeling of the velocity-measurement-error correlation time does not greatly affect the position error performance and that in all cases the position error performance is superior to the performance without the velocity measurement. However, it can be seen that mismodeling does greatly affect the velocity error performance. In fact, for correlation times much shorter than the design correlation time, the velocity error performance is worse than if no velocity measurement were taken.

Looking now at the performance of the four-state estimator with design $\tau$ = 0.1 seconds, the effect of mismodeling on velocity error is rather large, but in all cases taking the velocity measurement does improve performance. However, the position error performance for this estimator can be significantly degraded by taking the velocity measurement if the correlation time is mismodeled.

For the three-state estimator with design $\tau$ = 0.1 seconds, both position and velocity error performance are degraded by taking velocity measurements and mismodeling the measurement-error correlation time. Of the four estimators considered here, only the three-state estimator with design $\tau$ = 10.0 seconds uses velocity measurements to improve both position and velocity error performance even when the measurement-error correlation time is badly mismodeled.

It  should  be  noted  that  the  minimax  design  procedures 
of  Refs.  15  and  16  can  be  used  in  some  cases  to  determine  the 
estimator  which  has  the  least  sensitivity  to  modeling  errors 
of  any  estimator  using  the  same  order  memory  as  the  system. 
Unfortunately,  these  procedures  cannot  be  feasibly  implemented 
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for  these  examples,  and  have  not  yet  been  extended  to  reduced- 
order  estimators.  The  last  example  seems  to  indicate  that 
some  three-state  estimator  may  be  less  sensitive  to  modeling 
errors  than  any  four-state  estimator.  However,  the  results 
presented  here  are  certainly  not  conclusive. 
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5. 


SUMMARY 


In  the  past,  navigation  data  processing  has  typically 
been  performed  in  a central  data  processor,  as  illustrated  in 
Fig.  5-1.  The  development  of  microprocessors  and  other  Large 
Scale  Integration  (LSI)  devices  has  made  a new  data  processing 
structure  feasible.  Called  modular  estimation  in  this  report, 
this  data  processing  structure  is  composed  of  data  preprocessors 
located  at  the  sensors  and  a data  coordinator  located  with  the 
user,  as  shown  in  Fig.  5-2. 
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Figure  5-1  Centralized  Navigation  Data  Processor 
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Figure  5-2 


Modular  Navigation  Data  Processor 


Current  trends  toward  integrating  all  avionics  in  a 
coordinated  aircraft  information  system  make  modular  estima- 
tors desirable  for  the  following  reasons: 

(1)  Computer Capacity - Data preprocessors operate in parallel with the data coordinator. This increases the computational capacity of the system without requiring an increase in the speed of the data coordinator and without introducing some of the timing problems usually associated with real-time data processing.

(2)  Executive  Program  Complexity  - Since 
data  can  be  stored  in  the  preproces- 
sors,  many  of  the  complex  timing  prob- 
lems and  executive  interrupt  structures 
usually  associated  with  real-time  data 
processing  can  be  avoided. 

(3)  Channel  Capacity  - Since  data  compres- 
sion occurs  at  the  sensor,  the  capacity 
required  for  the  communication  channel 
feeding  the  data  coordinator  can  be 
greatly  reduced. 

(4)  Modularity  and  Flexibility  - The  inter- 
faces  between  the  data  coordinator  and 
certain  functional  types  of  sensor/ 
preprocessor  packages  can  be  fixed. 

Thus,  a velocity  reference  could  replace 
any  other  velocity  reference  without  af- 
fecting the  hardware  design  of  any  other 
part  of  the  system. 

(5)  System  Reliability  - Preprocessors  can  be 
designed  in  such  a way  that  a failure  of 
a single  component  will  still  leave  the 
system  operational. 


This  report  presents  methods  of  designing  and  eval- 
uating modular  estimators.  An  example  is  used  to  illustrate 
the  techniques  which  can  be  employed  in  designing  a prepro- 
cessor. The  sensitivity  analysis  techniques  required  to 
evaluate  the  performance  of  the  complete  modular  estimator 
are  also  illustrated. 
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This  design  procedure  can  solve  many  of  the  problems 
encountered  in  developing  a modular  estimator.  For  example, 
with  a data  bus  system  like  the  Digital  Avionics  Information 
System  (DAIS),  this  design  procedure  can  be  used  to  answer 
questions  such  as  the  following: 

• How  many  bits  per  second  should  be 
allocated  for  each  sensor? 

• How  often  should  sensor  data  be  put  on 
the  bus? 

• What  type  of  prefilter  should  be  used 
with  each  sensor? 

• How  should  sensor  data  be  quantized  and 
coded  for  transmission? 

• What  type  of  algorithm  should  be  used  to 
coordinate  the  data  coming  from  the  sensors? 
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RELEVANT  INFORMATION  THEORY  CONCEPTS 
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A.l  MODULAR  ESTIMATION  AND  COMMUNICATION  SYSTEM  DESIGN 

In  some  respects,  modular  estimator  design  can  be 
viewed  as  a communication  system  design  problem;  therefore 
several  concepts  which  are  used  extensively  in  communication 
system  design  can  be  applied  to  the  analysis  and  design  of 
modular  estimation  algorithms.  This  section  explores  the 
connections  between  modular  estimation  and  communication 
system  design.  Subsequent  sections  present  important  infor- 
mation theory  concepts  and  discuss  their  application  to 
modular  estimator  design. 

The classical communication system configuration is shown in Fig. A.1-1. Messages are produced by the source and are to be transmitted over the channel to the user. An encoder is used to translate the messages to a form which can be transmitted over the channel; a decoder performs the inverse translation back to a form which the user can interpret. During message transmission, the channel introduces errors, so the received message $\hat{x}$ differs from the transmitted message x. A distortion measure $\rho(x,\hat{x})$ is used to measure the significance of this deviation. By properly designing the encoder and decoder to match the source and channel, the expected distortion $E[\rho(x,\hat{x})]$ can be minimized.

Figure A.1-1  Classical Communication System Configuration

Usually the communication system design can be partitioned as illustrated in Fig. A.1-2. By separating the
encoder  into  a source  encoder  and  a channel  encoder  and  the 
decoder  into  a channel  decoder  and  a source  decoder,  the 
design  procedure  can  be  factored  into  a phase  which  depends 
primarily  on  the  channel  characteristics  and  a phase  which 
depends  primarily  on  the  source  characteristics.  The  design 
of  the  source  encoder-decoder  combination  will  be  emphasized 
here;  channel  encoder-decoder  design  will  not  be  discussed 
in  detail.  The  only  relevant  characteristic  of  the  channel 
will  be  its  capacity  which  will  be  defined  later. 
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The  modular  estimation  problem  can  be  cast  in  the 
communication  system  configuration  shown  in  Fig.  A. 1-3.  In 
the  estimation  problem,  the  encoder  is  composed  of  the  sen- 
sor and  the  preprocessor.  Therefore  part  of  the  encoder 
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design,  the  sensor,  is  usually  fixed  and  not  subject  to  opti- 
mization. This  requires  some  minor  modification  of  standard 
information  theory  methods.  The  details  of  this  development 
will  follow.  First  relevant  terms  from  information  theory 
will  be  defined.  More  complete  discussions  of  these  con- 
cepts are  contained  in  Refs.  8 through  10. 


A.2  DEFINITION OF TERMS

A.2.1  Entropy

The entropy of a random variable is a measure of the uncertainty in the value which it may assume. For a discrete random variable x which may assume k values $a_1, a_2, \ldots, a_k$ with the probabilities $p(a_1), p(a_2), \ldots, p(a_k)$, the entropy is defined to be

    $H(x) = E[-\log p(x)] = -\sum_{i=1}^{k} p(a_i)\log p(a_i)$    (A.2-1)


If y is another variable taking n values $b_1, b_2, \ldots, b_n$ which is jointly distributed with x, then the entropy of the joint distribution is

    $H(x,y) = E[-\log p(x,y)] = -\sum_{i=1}^{k}\sum_{j=1}^{n} p(a_i,b_j)\log p(a_i,b_j)$    (A.2-2)


where $p(a_i,b_j)$ is the probability that x takes the value $a_i$ and y takes the value $b_j$. The conditional entropy of x given y is defined to be the expected value of the entropy of the conditional distribution for x given y. That is

    $H(x|y) = -\sum_{j=1}^{n} p(b_j)\sum_{i=1}^{k} p(a_i|b_j)\log p(a_i|b_j) = E[-\log p(x|y)]$    (A.2-3)

where $p(a_i|b_j)$ is the probability that x takes the value $a_i$ given that y has the value $b_j$. The conditional entropy measures the average uncertainty about the value of x which remains after the value assumed by y is known.

The  entropy  function  has  a number  of  properties 
which  justify  its  interpretation  as  a measure  of  uncertainty. 

(1)  H(x) is a function of the probabilities $p(a_1), p(a_2), \ldots, p(a_k)$ only and not a function of the values which x may assume. The uncertainty depends only on the probabilities of the alternatives and not on any other characteristic of the alternatives.

(2)  H(x) is a continuous function of the $p(a_i)$'s. A small change in the probabilities produces only a small change in the uncertainty.

(3)  For fixed k, H(x) achieves its maximum when

         $p(a_i) = \frac{1}{k}, \quad i = 1, 2, \ldots, k$    (A.2-4)

     Uncertainty is maximum when all possible alternatives are equally likely.

(4)  Adding an additional value $a_{k+1}$ which x could assume with probability zero does not change the entropy. The addition of zero probability alternatives does not change the uncertainty.

(5)  $H(x,y) = H(y) + H(x|y)$    (A.2-5)

     The uncertainty in x and y considered jointly is equal to the uncertainty in y alone plus the uncertainty in x given y.
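
Properties (3), (4) and (5) are easy to check numerically. The sketch below uses base-2 logarithms and a small joint distribution whose values are purely hypothetical; the function name is illustrative.

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Hypothetical joint distribution p(a_i, b_j): rows index a_i, columns index b_j.
joint = [[0.30, 0.10],
         [0.20, 0.40]]

p_y = [sum(col) for col in zip(*joint)]
H_xy = entropy([p for row in joint for p in row])
H_y = entropy(p_y)
H_x_given_y = sum(
    p_y[j] * entropy([joint[i][j] / p_y[j] for i in range(len(joint))])
    for j in range(len(p_y))
)

print(f"H(x,y) = {H_xy:.4f} bits")
print(f"H(y) + H(x|y) = {H_y + H_x_given_y:.4f} bits")          # property (5)
print(entropy([0.25] * 4), entropy([0.7, 0.1, 0.1, 0.1]))         # property (3)
print(entropy([0.5, 0.5]), entropy([0.5, 0.5, 0.0]))              # property (4)
```

Switching `math.log2` to `math.log` rescales every entropy by the same constant, which is the freedom in the choice of logarithmic base.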


In Ref. 11, Khinchin shows that these properties can be taken as axioms, from which the form of the entropy function is deduced to be

    $H(x) = -\lambda \sum_{i=1}^{k} p(a_i)\log p(a_i)$    (A.2-6)

where $\lambda$ is an arbitrary multiplying constant. Therefore these properties uniquely specify the form of the entropy function to within a constant multiplying factor. Changing the multiplying factor effectively changes the base of the logarithm used. Therefore, any choice of logarithmic base is valid as long as the same base is used consistently.

The entropy concept can be extended to continuous random vectors by the techniques explained in Ref. 12. The entropy of a random vector x with probability density function p(x) is defined to be

    $H(x) = E[-\log p(x)] = -\int_{-\infty}^{\infty} p(x)\log p(x)\,dx$    (A.2-7)


If x and y have the joint density function p(x,y) and the conditional density function p(x|y), then the joint entropy of x and y is

    $H(x,y) = E[-\log p(x,y)] = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p(x,y)\log p(x,y)\,dx\,dy$    (A.2-8)

and the conditional entropy of x given y is

    $H(x|y) = E[-\log p(x|y)] = -\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} p(x|y)\log p(x|y)\,dx\right] p(y)\,dy$    (A.2-9)


All  of  the  properties  of  the  entropy  function  extend  to  random 
vectors  with  continuous  distributions. 

The entropy of a Gaussian random vector is of particular interest. Since the density function of a Gaussian vector of dimension n with mean m and covariance P is

    $p(x) = \frac{1}{(2\pi)^{n/2}(\det P)^{1/2}}\exp\left[-\frac{1}{2}(x-m)^T P^{-1}(x-m)\right]$    (A.2-10)
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The  entropy  of  x is 

    $H(x) = E[-\log p(x)]$
         $= \frac{n}{2}\log 2\pi + \frac{1}{2}\log\det P + \frac{1}{2}E[(x-m)^T P^{-1}(x-m)]\log e$    (A.2-11)

Noting that

    $E[(x-m)^T P^{-1}(x-m)] = \mathrm{tr}\{P^{-1}E[(x-m)(x-m)^T]\} = n$    (A.2-12)

gives

    $H(x) = \frac{n}{2}\log 2\pi e + \frac{1}{2}\log\det P$    (A.2-13)
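
As a quick check of Eq. (A.2-13), the sketch below compares the closed form with a direct numerical evaluation of the integral of -p(x) log p(x) for a scalar Gaussian variable (n = 1, so det P is simply the variance). Natural logarithms are used, and the function names and the value of the variance are illustrative assumptions.

```python
import math

def gaussian_entropy(n, det_P):
    """Closed form of Eq. (A.2-13), in nats."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e) + 0.5 * math.log(det_P)

def numerical_entropy_scalar(mean, variance, points=20000, span=12.0):
    """Midpoint-rule evaluation of -integral p log p for a scalar Gaussian density."""
    sigma = math.sqrt(variance)
    lo = mean - span * sigma
    dx = 2.0 * span * sigma / points
    total = 0.0
    for i in range(points):
        x = lo + (i + 0.5) * dx
        p = math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total -= p * math.log(p) * dx
    return total

closed_form = gaussian_entropy(1, 9.0)
numerical = numerical_entropy_scalar(mean=2.0, variance=9.0)
print(f"closed form = {closed_form:.6f} nats, numerical = {numerical:.6f} nats")
```

The result does not depend on the mean, in agreement with the closed form, which involves only the dimension and the covariance.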


A.2.2  Average Mutual Information


The amount of information conveyed about a random variable by observing another random variable is measured by the average mutual information. For random vectors x and y with continuous distributions, the average mutual information is defined to be

    $I(x;y) = E\left[\log\frac{p(x,y)}{p(x)\,p(y)}\right]$    (A.2-14)

It can be shown that

    $I(x;y) = H(x) - H(x|y)$
           $= H(x) + H(y) - H(x,y)$
           $= H(y) - H(y|x)$    (A.2-15)

Thus the average mutual information can be interpreted as the uncertainty in x minus the average uncertainty in x given knowledge of the value assumed by y, i.e., the uncertainty in x resolved by the knowledge of y. This interpretation also applies if the roles of x and y are interchanged; therefore, this function is called the average mutual information.

The average mutual information of Gaussian vectors and random processes is of particular interest for estimation problems. Reference 12 proves the following results.

(1) Let x and y be jointly Gaussian random vectors, with joint covariance matrix

$$ \begin{bmatrix} A & D \\ D^T & B \end{bmatrix} \tag{A.2-16} $$

Then

$$ I(x;y) = -\tfrac{1}{2}\log\det\left(I - D^T A^{-1} D B^{-1}\right) = -\tfrac{1}{2}\log\det\left(I - D B^{-1} D^T A^{-1}\right) \tag{A.2-17} $$
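As an illustration of result (1), the sketch below evaluates the mutual information for the scalar case, where A and B are variances and D is the cross-covariance, and compares it with the entropy identity H(x) + H(y) - H(x,y). The numerical values and function names are assumptions chosen for the example, and natural logarithms are used.

```python
import math


def gaussian_entropy(det_cov: float, n: int) -> float:
    """Entropy in nats of an n-dimensional Gaussian with covariance determinant det_cov."""
    return 0.5 * n * math.log(2.0 * math.pi * math.e) + 0.5 * math.log(det_cov)


def mutual_information_scalar(a: float, b: float, d: float) -> float:
    """Scalar form of the log-det formula: -(1/2) ln(1 - d^2 / (a b))."""
    return -0.5 * math.log(1.0 - d * d / (a * b))


a, b, d = 4.0, 9.0, 3.0  # assumed var(x), var(y), cov(x, y)
i_direct = mutual_information_scalar(a, b, d)
# The joint covariance [[a, d], [d, b]] has determinant a*b - d^2.
i_from_entropies = (gaussian_entropy(a, 1) + gaussian_entropy(b, 1)
                    - gaussian_entropy(a * b - d * d, 2))
print(i_direct, i_from_entropies)
```

Both routes give the same number because the 2πe terms cancel, leaving half the log of the ratio between the product of the marginal variances and the joint determinant.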


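(2) If $y^k = \{y_j,\ 0 \le j \le k\}$ is a Gaussian random process and x is a Gaussian random vector, then

$$ I(x; y^k) = \tfrac{1}{2}\log\det E[x x^T] - \tfrac{1}{2}\log\det P_k \tag{A.2-18} $$

where

$$ P_k = E\left[(x - \hat{x}_k)(x - \hat{x}_k)^T\right] $$

$$ \hat{x}_k = E[x | y^k] = \text{minimum variance estimate of } x \text{ given the values of } y^k $$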




Statistically  independent  problems  can 
be  treated  separately. 


(4) If y is a vector and f(·) is a compatible function, then

$$ I(x;y) \ge I(x;f(y)) \tag{A.2-22} $$

A transformation of an observation cannot increase the information.


The  most  important  applications  of  the  average  mutual 
information  are  in  determining  channel  capacity  and  the  rate 
distortion  function  of  a source.  These  concepts  are  defined 
in  the  next  sections. 


A.2.3 Channel Capacity

In designing a communication system, it is useful to know the ultimate information carrying capacity of a channel with the optimum encoder-decoder combination. The average mutual information can be used to define such an information capacity measure. Let c be the input to the channel and let $\hat{c}$ be the output. The source-encoder combination which produces c determines the probability density function p(c), but the channel determines the conditional density $p(\hat{c}|c)$.

The average amount of information carried by the channel is measured by the mutual information between c and $\hat{c}$, which is

$$ I(c;\hat{c}) = \int\!\!\int p(\hat{c}|c)\, p(c) \log \frac{p(\hat{c}|c)}{p(\hat{c})}\, dc\, d\hat{c} \tag{A.2-23} $$

where

$$ p(\hat{c}) = \int_{-\infty}^{\infty} p(\hat{c}|c)\, p(c)\, dc \tag{A.2-24} $$


The channel capacity is determined by varying p(c) to maximize $I(c;\hat{c})$. That is,

$$ C = \max_{p(c)} I(c;\hat{c}) \tag{A.2-25} $$

Thus the channel capacity determines the maximum amount of mutual information which can be achieved if the encoder is designed to present the optimum message structure to the channel.
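The maximization over the input distribution can be illustrated with a discrete analogue. The sketch below is not taken from the report: it assumes a binary symmetric channel with crossover probability 0.1, replaces the densities by probability tables, and searches a grid of input distributions for the one that maximizes the mutual information (in bits).

```python
import math
from typing import List, Tuple


def mutual_information_bits(p_input: List[float], channel: List[List[float]]) -> float:
    """I(c; c_hat) in bits, with channel[i][j] = P(c_hat = j | c = i)."""
    n_out = len(channel[0])
    p_out = [sum(p_input[i] * channel[i][j] for i in range(len(p_input)))
             for j in range(n_out)]
    total = 0.0
    for i, p_c in enumerate(p_input):
        for j in range(n_out):
            joint = p_c * channel[i][j]
            if joint > 0.0:  # zero-probability terms contribute nothing
                total += joint * math.log2(channel[i][j] / p_out[j])
    return total


def capacity_by_grid_search(channel: List[List[float]],
                            steps: int = 1000) -> Tuple[float, float]:
    """Vary P(c = 0) = q over a grid and keep the largest mutual information."""
    best_rate, best_q = 0.0, 0.0
    for k in range(steps + 1):
        q = k / steps
        rate = mutual_information_bits([q, 1.0 - q], channel)
        if rate > best_rate:
            best_rate, best_q = rate, q
    return best_rate, best_q


crossover = 0.1  # assumed channel error probability
bsc = [[1.0 - crossover, crossover], [crossover, 1.0 - crossover]]
capacity, best_q = capacity_by_grid_search(bsc)
print(capacity, best_q)
```

For this symmetric channel the search settles on equally likely inputs, and the resulting capacity matches the known value of one bit minus the binary entropy of the crossover probability.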




A.2.4 The Rate-Distortion Curve

For a given source, each encoder-channel-decoder combination will have an information transfer rate which is less than or equal to the channel capacity. Another measure of the performance of the encoder-channel-decoder combination is the distortion $E[\rho(x,\hat{x})]$, where x is the message which was transmitted, $\hat{x}$ is the estimate received by the user, and $\rho(x,\hat{x})$ is a metric such as

$$ \rho(x,\hat{x}) = \|x - \hat{x}\|^2 \tag{A.2-26} $$

For a specified source, the distribution of messages specified by p(x) is fixed. By adjusting the encoder-channel-decoder combination, the conditional probability density function $p(\hat{x}|x)$ can be manipulated. Each choice of $p(\hat{x}|x)$ determines an information transfer rate, measured by $I(x;\hat{x})$, and a distortion measured by $E[\rho(x,\hat{x})]$. This corresponds to a point in the rate-distortion plane. See Fig. A.2-1. The lower right boundary of the region of possible performance combinations is called the rate-distortion curve (or rate-distortion function). The rate-distortion curve for the source is determined by varying $p(\hat{x}|x)$ to minimize $I(x;\hat{x})$ subject to a distortion constraint

$$ E[\rho(x,\hat{x})] = d \tag{A.2-27} $$


Thus the rate-distortion curve specifies the minimum channel capacity required to maintain a specified distortion. Conversely, the rate-distortion curve also specifies the minimum distortion which can be achieved with a fixed channel capacity.


The rate-distortion function is defined by a constrained minimization problem which can be solved using Lagrange multiplier and calculus of variations methods. In Refs. 8 through 10, it is shown that the minimization produces

$$ p(\hat{x}|x) = \beta(x)\, p(\hat{x})\, e^{s\rho(x,\hat{x})} \tag{A.2-28} $$

where $\beta(x)$ and s act as Lagrange multipliers. Here $\beta(x)$ is chosen to make

$$ \int p(\hat{x}|x)\, d\hat{x} = 1 $$

for every x, and s is chosen to produce a distortion of

$$ E[\rho(x,\hat{x})] = d $$

The minimum rate corresponding to this distortion is the resulting average mutual information, which is

$$ R(d) = I(x;\hat{x}) = sd + \int p(x) \ln \beta(x)\, dx \tag{A.2-29} $$


An additional constraint is imposed implicitly because the average mutual information can never be negative. Therefore, for any compatible matrix A, the mutual information between $Ax$ and $A\hat{x}$ must be non-negative.




If the distortion measure depends only on the difference between x and $\hat{x}$, then it can be written as $\rho(x - \hat{x})$ and

$$ p(x|\hat{x}) = \beta(x)\, p(x)\, e^{s\rho(x - \hat{x})} \tag{A.2-30} $$

Now for every $\hat{x}$, this must be a probability density function,

$$ \int p(x|\hat{x})\, dx = 1 \tag{A.2-31} $$

for all $\hat{x}$. If $\rho(x - \hat{x})$ is nonzero for all nonzero values of $(x - \hat{x})$, this implies that

$$ p(x)\, \beta(x) = a \tag{A.2-32} $$

must be a constant.


A.3 RATE-DISTORTION CURVE FOR A GAUSSIAN
VECTOR WITH QUADRATIC LOSS

An important special case results when the source distribution is Gaussian with mean zero and covariance matrix $P^*$ and the distortion measure is the quadratic form

$$ \rho(x - \hat{x}) = \|W(x - \hat{x})\|^2 = (x - \hat{x})^T W^T W (x - \hat{x}) \tag{A.3-1} $$


where W is a weighting matrix. It will be shown here that the encoder-channel-decoder combination which minimizes the required channel capacity with a specified distortion can be achieved by first taking a measurement

$$ z^* = H^* x + v^* \tag{A.3-2} $$

and then setting

$$ \hat{x} = K^* z^* \tag{A.3-3} $$

where

$v^*$ is a zero mean Gaussian noise with covariance matrix $R^*$

$H^*$ is the measurement matrix (which will be specified later in this section)

$$ K^* = P^* H^{*T} \left[H^* P^* H^{*T} + R^*\right]^{-1} \tag{A.3-4} $$


To see this, first define

$$ y = D W x \tag{A.3-5} $$

where D is an orthogonal matrix which diagonalizes $W P^* W^T$ to produce the matrix of eigenvalues

$$ \Lambda = D W P^* W^T D^T = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \tag{A.3-6} $$


Now x can be expressed as

$$ x = a + M y \tag{A.3-7} $$

where a and y are independent. (This is a result of the projection theorem.) The estimate can also be written as

$$ \hat{x} = \hat{a} + M \hat{y} \tag{A.3-8} $$

where $\hat{a}$ and $\hat{y}$ are independent.


The average mutual information between x and $\hat{x}$ is

$$ I(x;\hat{x}) = I(a;\hat{a}) + I(y;\hat{y}) \tag{A.3-9} $$

The distortion measure depends only on $(y - \hat{y})$; i.e.,

$$ \rho(x - \hat{x}) = (x - \hat{x})^T W^T D^T D W (x - \hat{x}) = (y - \hat{y})^T (y - \hat{y}) \tag{A.3-10} $$


Therefore, the average mutual information can be minimized by setting $\hat{a} = 0$ so that

$$ I(a;\hat{a}) = 0 \tag{A.3-11} $$

and by then minimizing $I(y;\hat{y})$. Since $\rho(y - \hat{y})$ is nonzero for all nonzero values of $(y - \hat{y})$, minimizing the average mutual information between y and $\hat{y}$ produces

$$ p(y|\hat{y}) = a \exp\left[s (y - \hat{y})^T (y - \hat{y})\right] \tag{A.3-12} $$

where a is a normalizing constant chosen to make

$$ \int_{-\infty}^{\infty} p(y|\hat{y})\, dy = 1 \tag{A.3-13} $$


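Thus y given $\hat{y}$ has a normal distribution with covariance matrix $-\frac{1}{2s} I$. Requiring a distortion equal to $d^\circ$ gives

$$ \begin{aligned} E[\rho(x,\hat{x})] &= \mathrm{tr}\, E\left[(y - \hat{y})(y - \hat{y})^T\right] \\ &= \mathrm{tr}\left\{ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y - \hat{y})(y - \hat{y})^T\, a \exp\left[-\tfrac{1}{2}(y - \hat{y})^T (-2sI)(y - \hat{y})\right] dy\; p(\hat{y})\, d\hat{y} \right\} \\ &= \mathrm{tr}\left\{ -\frac{1}{2s} I \int p(\hat{y})\, d\hat{y} \right\} = -\frac{n}{2s} = d^\circ \end{aligned} \tag{A.3-14} $$

where n is the dimension of y. Therefore,

$$ s = -\frac{n}{2 d^\circ} \tag{A.3-15} $$

and y given $\hat{y}$ has the covariance matrix $\frac{d^\circ}{n} I$.

The rate-distortion curve is given by the relation

$$ \begin{aligned} r^\circ = I(y;\hat{y}) &= H(y) - H(y|\hat{y}) \\ &= \tfrac{1}{2}\log\det\Lambda - \tfrac{1}{2}\log\det\left(\frac{d^\circ}{n} I\right) \\ &= \frac{1}{2}\sum_{i=1}^{n}\left[\log\lambda_i - \log\frac{d^\circ}{n}\right] \end{aligned} \tag{A.3-16} $$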


This rate applies if a negative mutual information does not occur in any direction $Ay$. Since the components of y are independent Gaussian variables, the requirement is

$$ \log\lambda_i - \log\frac{d^\circ}{n} \ge 0 \quad \text{for every } i \tag{A.3-17} $$

or

$$ d^\circ \le n\lambda_i \quad \text{for every } i \tag{A.3-18} $$

When $n\lambda_i < d^\circ$ for some i, then no new information is required to estimate that component; the corresponding component of the unconditional mean is used. Suppose that the eigenvalues of $W P^* W^T$ are arranged in descending order of magnitude. As long as $d^\circ \le n\lambda_n$, then

$$ r^\circ = \frac{1}{2}\sum_{i=1}^{n}\left[\log\lambda_i - \log\frac{d^\circ}{n}\right] \tag{A.3-19} $$


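For $d^\circ > n\lambda_n$, the problem becomes an n-1 dimensional problem with distortion constrained to be $d^\circ - \lambda_n$. Thus

$$ r^\circ = \frac{1}{2}\sum_{i=1}^{n-1}\left[\log\lambda_i - \log\frac{d^\circ - \lambda_n}{n-1}\right] \tag{A.3-20} $$

provided

$$ \log\lambda_i - \log\frac{d^\circ - \lambda_n}{n-1} \ge 0 \quad \text{for } 1 \le i \le n-1 \tag{A.3-21} $$

or

$$ (n-1)\lambda_{n-1} + \lambda_n \ge d^\circ \tag{A.3-22} $$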


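In general, for

$$ (n-k+1)\lambda_{n-k+1} + \sum_{i=n-k+2}^{n}\lambda_i < d^\circ \le (n-k)\lambda_{n-k} + \sum_{i=n-k+1}^{n}\lambda_i $$

the rate is

$$ r^\circ = \frac{1}{2}\sum_{i=1}^{n-k}\left[\log\lambda_i - \log\epsilon\right] \tag{A.3-23} $$

where

$$ \epsilon = \frac{d^\circ - \sum_{i=n-k+1}^{n}\lambda_i}{n-k} \tag{A.3-24} $$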
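The case-by-case procedure above can be sketched as a short routine. This is an illustrative implementation under stated assumptions, not code from the report: the eigenvalues are supplied directly, logarithms are natural, and the function name `rate_distortion_gaussian` is hypothetical. It increases the number k of dropped (smallest) components until the per-component error variance no longer exceeds the smallest retained eigenvalue.

```python
import math
from typing import List, Optional, Tuple


def rate_distortion_gaussian(eigenvalues: List[float],
                             d: float) -> Tuple[float, Optional[float], int]:
    """Return (rate in nats, eps, k) for total distortion d.

    k is the number of smallest-eigenvalue components that receive no
    information; eps is the error variance of each retained component.
    """
    if d <= 0.0:
        raise ValueError("distortion must be positive")
    lam = sorted(eigenvalues, reverse=True)
    n = len(lam)
    if d >= sum(lam):
        return 0.0, None, n  # the unconditional mean already meets d
    for k in range(n):
        dropped = sum(lam[n - k:])          # distortion from dropped components
        eps = (d - dropped) / (n - k)       # variance left for each retained one
        if eps <= lam[n - k - 1]:           # no retained component gets negative rate
            rate = 0.5 * sum(math.log(l / eps) for l in lam[:n - k])
            return rate, eps, k
    return 0.0, None, n  # not reached when d < sum(lam)


rate, eps, k = rate_distortion_gaussian([4.0, 2.0, 0.5], d=3.0)
print(rate, eps, k)
```

In this example the smallest eigenvalue (0.5) is dropped because spreading the distortion evenly over three components would require more error than that component has variance, so the remaining distortion of 2.5 is shared by the two larger components.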


In each case, the estimate $\hat{y}$ can be viewed as the result of taking the (n-k)-dimensional measurement

$$ z = A y + v \tag{A.3-25} $$

where v is a zero mean Gaussian noise with identity covariance matrix,

$$ A = \begin{bmatrix} \mathrm{diag}(\delta_1, \ldots, \delta_{n-k}) & 0 \end{bmatrix} \tag{A.3-26} $$

and each $\delta_i$ is chosen to make the ith component of the covariance of y given $\hat{y}$ equal to $\epsilon$.


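Using one form of the Kalman filter covariance equation (see Ref. 17, p. 111) gives

$$ \mathrm{Cov}[y|\hat{y}]^{-1} = \mathrm{Cov}[y]^{-1} + A^T A \tag{A.3-27} $$

which implies that

$$ \epsilon^{-1} = \lambda_i^{-1} + \delta_i^2 \tag{A.3-28} $$

or

$$ \delta_i = \sqrt{\frac{\lambda_i - \epsilon}{\epsilon\,\lambda_i}} \tag{A.3-29} $$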
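A quick numerical check of the gain formula, written as a sketch with assumed values for the eigenvalue and the target error variance (the function names are illustrative): the gain is computed from the square-root expression, and the information-form covariance update is then used to confirm that the posterior variance equals the target.

```python
import math


def measurement_gain(lam: float, eps: float) -> float:
    """Gain delta that leaves error variance eps for a component of prior variance lam."""
    if not 0.0 < eps <= lam:
        raise ValueError("require 0 < eps <= lam")
    return math.sqrt((lam - eps) / (eps * lam))


def posterior_variance(lam: float, delta: float) -> float:
    """Scalar information-form update with unit-variance measurement noise."""
    return 1.0 / (1.0 / lam + delta * delta)


lam, eps = 4.0, 1.25  # assumed eigenvalue and target error variance
delta = measurement_gain(lam, eps)
recovered = posterior_variance(lam, delta)
print(delta, recovered)
```

When eps equals lam the gain is zero, which corresponds to a component that receives no measurement at all.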


The measurement vector can also be written as

$$ z^* = H^* x + v^* \tag{A.3-30} $$

where $v^*$ has covariance matrix $R^*$ and

$$ H^* = R^{*1/2} A D W \tag{A.3-31} $$

($R^{*1/2}$ represents the symmetric square root of $R^*$.) For the purpose of determining the rate-distortion function, this measurement completely describes the optimum encoder-channel-decoder combination and the estimate $\hat{x}$ is formed by using the Kalman filter equations.


A.4 RATE-DISTORTION CURVE FOR A GAUSS-MARKOV
PROCESS WITH QUADRATIC LOSS

Suppose the source is described by a linear differential equation

$$ \dot{x}(t) = F(t)\, x(t) + G(t)\, w(t) \tag{A.4-1} $$

where w is a white Gaussian noise with spectral matrix Q(t). Assume that sufficient information has been gathered prior to time $t_k$ to give a Gaussian conditional distribution for $x(t_k)$ with covariance $P(t_k)$ and mean $\hat{x}(t_k)$. Then $x(t_{k+1})$ can be predicted by

$$ x^*(t_{k+1}) = \Phi(t_{k+1}, t_k)\, \hat{x}(t_k) \tag{A.4-2} $$

The error in this prediction

$$ \tilde{x}^*(t_{k+1}) = x(t_{k+1}) - x^*(t_{k+1}) \tag{A.4-3} $$

has a Gaussian distribution with covariance matrix

$$ P^*(t_{k+1}) = \Phi(t_{k+1}, t_k)\, P(t_k)\, \Phi^T(t_{k+1}, t_k) + Q(t_{k+1}, t_k) \tag{A.4-4} $$

where

$$ Q(t_{k+1}, t_k) = \int_{t_k}^{t_{k+1}} \Phi(t_{k+1}, t)\, G(t)\, Q(t)\, G^T(t)\, \Phi^T(t_{k+1}, t)\, dt \tag{A.4-5} $$
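The covariance prediction can be illustrated for a scalar source. The sketch below assumes constant scalar values for F, G and Q (so the transition function is an exponential) and compares the closed-form integrated process noise with a midpoint-rule evaluation of the integral. All numerical values and function names are assumptions made for the example.

```python
import math


def predict_covariance(p: float, f: float, g: float, q: float, dt: float) -> float:
    """Closed-form prediction for a scalar, time-invariant source."""
    phi = math.exp(f * dt)  # scalar transition function over the interval
    if f == 0.0:
        q_int = g * g * q * dt
    else:
        q_int = g * g * q * (math.exp(2.0 * f * dt) - 1.0) / (2.0 * f)
    return phi * p * phi + q_int


def predict_covariance_numeric(p: float, f: float, g: float, q: float,
                               dt: float, steps: int = 10000) -> float:
    """Same prediction with the process-noise integral evaluated by the midpoint rule."""
    h = dt / steps
    q_int = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        phi_t = math.exp(f * (dt - t))  # transition from time t to the end of the interval
        q_int += phi_t * g * q * g * phi_t * h
    phi = math.exp(f * dt)
    return phi * p * phi + q_int


closed = predict_covariance(p=1.0, f=-0.5, g=1.0, q=2.0, dt=0.1)
numeric = predict_covariance_numeric(p=1.0, f=-0.5, g=1.0, q=2.0, dt=0.1)
print(closed, numeric)
```

For a matrix-valued source the same structure applies, with the exponential replaced by the transition matrix and the products written in the order shown in the covariance equation.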


Assume that the distortion measure is the same as in the previous section. Note that

$$ x(t_{k+1}) = x^*(t_{k+1}) + \tilde{x}^*(t_{k+1}) \tag{A.4-6} $$

and

$$ x(t_{k+1}) - \hat{x}(t_{k+1}) = \tilde{x}^*(t_{k+1}) - \left[\hat{x}(t_{k+1}) - x^*(t_{k+1})\right] \tag{A.4-7} $$


Therefore the problem is one of estimating the Gaussian vector $\tilde{x}^*(t_{k+1})$ with the vector $[\hat{x}(t_{k+1}) - x^*(t_{k+1})]$. That is, the problem is the same as in the previous section with $\tilde{x}^*(t_{k+1})$ taking the role of x and $\hat{x}(t_{k+1}) - x^*(t_{k+1})$ taking the role of $\hat{x}$.


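Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $W P^*(t_{k+1}) W^T$ arranged in descending order. Then the rate-distortion curve is given by

$$ r^\circ = \frac{1}{2}\sum_{i=1}^{n-k}\left[\log\lambda_i - \log\frac{d^\circ - \sum_{i=n-k+1}^{n}\lambda_i}{n-k}\right] \tag{A.4-8} $$

for

$$ (n-k+1)\lambda_{n-k+1} + \sum_{i=n-k+2}^{n}\lambda_i < d^\circ \le (n-k)\lambda_{n-k} + \sum_{i=n-k+1}^{n}\lambda_i $$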

A.5 RATE-DISTORTION CURVE FOR A LINEAR QUADRATIC
GAUSSIAN PROBLEM WITH SENSOR CONSTRAINTS

In the previous section, no constraints were placed on the structure of the estimating system which forms $\hat{x}(t_{k+1})$. However, in most practical problems, the estimating system is constrained to observe only measurements of the form

$$ z(t) = H(t)\, x(t) + v(t) \tag{A.5-1} $$

either continuously or discretely in time. Here v(t) is a white measurement noise. Results of Wolf and Ziv (Ref. 4) and Dobrushin and Tsybakov (Ref. 5) can be extended to show that the optimal encoder can be partitioned into a Kalman filter which preprocesses the measurements followed by an encoder which treats the Kalman filter as its source as shown in Fig. A.5-1 (see Ref. 13).

Figure A.5-1 Optimal Encoder for System with Sensor Constraints


If the measurements are available continuously, the Kalman filter is a continuous time linear system driven by the innovations process $\nu(t)$,

$$ \dot{\hat{x}}_1(t) = F(t)\, \hat{x}_1(t) + K(t)\, \nu(t) \tag{A.5-2} $$

The innovations process is a Gaussian white noise process with covariance matrix R(t), the same covariance as the measurement noise.


If the measurements are available only at discrete instants of time, the Kalman filter is a discrete time linear system driven by an innovations sequence,

$$ \hat{x}_1(t_{k+1}) = \Phi(t_{k+1}, t_k)\, \hat{x}_1(t_k) + K(t_{k+1})\, \nu(t_{k+1}) \tag{A.5-3} $$

The covariance of the innovations is

$$ N(t_{k+1}) = H(t_{k+1})\, P_1(t_{k+1})\, H^T(t_{k+1}) + R(t_{k+1}) \tag{A.5-4} $$

where $P_1(t_{k+1})$ is the covariance matrix for the Kalman prefilter's error in predicting $x(t_{k+1})$.


In either of these cases, the techniques developed in the previous section can be used to compute the rate-distortion function. The decoder now constructs an estimate $\hat{x}(t_{k+1})$ for the state of the Kalman prefilter $\hat{x}_1(t_{k+1})$. Assume that sufficient data has been gathered prior to $t_k$ to produce a Gaussian distribution for $\hat{x}_1(t_k)$ with covariance matrix $P(t_k)$. The covariance matrix for the error the receiver makes in predicting $\hat{x}_1(t_{k+1})$ based on this data is

$$ P^*(t_{k+1}) = \Phi(t_{k+1}, t_k)\, P(t_k)\, \Phi^T(t_{k+1}, t_k) + N(t_{k+1}, t_k) \tag{A.5-5} $$


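where

$$ N(t_{k+1}, t_k) = \int_{t_k}^{t_{k+1}} \Phi(t_{k+1}, t)\, K(t)\, R(t)\, K^T(t)\, \Phi^T(t_{k+1}, t)\, dt \tag{A.5-6} $$

for continuous measurements and

$$ N(t_{k+1}, t_k) = \sum_{t_k < t_i \le t_{k+1}} \Phi(t_{k+1}, t_i)\, K(t_i)\, N(t_i)\, K^T(t_i)\, \Phi^T(t_{k+1}, t_i) \tag{A.5-7} $$

for discrete measurements, where the sum runs over the measurement instants $t_i$ in the interval. The eigenvalues of $W P^*(t_{k+1}) W^T$ are used to determine the rate-distortion function.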


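There is one additional alteration to the previous techniques. The distortion consists of

$$ \begin{aligned} E[\rho(x,\hat{x})] &= \mathrm{tr}\, E\left\{ W[x(t_{k+1}) - \hat{x}_1(t_{k+1})][x(t_{k+1}) - \hat{x}_1(t_{k+1})]^T W^T \right. \\ &\qquad \left. +\, W[\hat{x}_1(t_{k+1}) - \hat{x}(t_{k+1})][\hat{x}_1(t_{k+1}) - \hat{x}(t_{k+1})]^T W^T \right\} \\ &= \mathrm{tr}\left[ W P_1(t_{k+1}) W^T + W P(t_{k+1}) W^T \right] \end{aligned} \tag{A.5-8} $$

Therefore the distortion cannot be made smaller than

$$ d_0 = \mathrm{tr}\left[ W P_1(t_{k+1}) W^T \right] \tag{A.5-9} $$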


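and the rate-distortion function is

$$ r^\circ = \frac{1}{2}\sum_{i=1}^{n-k}\left[\log\lambda_i - \log\frac{d^\circ - d_0 - \sum_{i=n-k+1}^{n}\lambda_i}{n-k}\right] \tag{A.5-10} $$

for

$$ (n-k+1)\lambda_{n-k+1} + \sum_{i=n-k+2}^{n}\lambda_i + d_0 < d^\circ \le (n-k)\lambda_{n-k} + \sum_{i=n-k+1}^{n}\lambda_i + d_0 $$

where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $W P^*(t_{k+1}) W^T$ arranged in descending order of magnitude.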
APPENDIX B


DERIVATION  OF  THE  ERROR  COVARIANCE  EQUATIONS 


The equations in Chapter 4 can be used to produce a recursion relation for the one-step prediction error. First note that

$$ \begin{aligned} \tilde{x}_{n+1|n} &= x_{n+1} - C_{n+1} m_{n+1} \\ &= \Phi_n x_n + w_n - C_{n+1} A_n m_n - C_{n+1} B_n (H_n x_n + v_n - G_n m_n) \\ &= (\Phi_n - C_{n+1} B_n H_n)\, x_n - C_{n+1}(A_n - B_n G_n)\, m_n + w_n - C_{n+1} B_n v_n \end{aligned} \tag{B-1} $$


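Recognizing that

$$ x_n = \tilde{x}_{n|n-1} + C_n m_n \tag{B-2} $$

gives

$$ \tilde{x}_{n+1|n} = (\Phi_n - C_{n+1} B_n H_n)\, \tilde{x}_{n|n-1} + \left[(\Phi_n - C_{n+1} B_n H_n) C_n - C_{n+1}(A_n - B_n G_n)\right] m_n + w_n - C_{n+1} B_n v_n \tag{B-3} $$

Similarly, substituting Eq. (B-2) into the recursion for $m_{n+1}$ used in Eq. (B-1) gives

$$ m_{n+1} = B_n H_n\, \tilde{x}_{n|n-1} + \left[A_n + B_n (H_n C_n - G_n)\right] m_n + B_n v_n \tag{B-4} $$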


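Therefore, $\tilde{x}_{n+1|n}$ and $m_{n+1}$ satisfy the coupled set of linear vector equations

$$ \begin{bmatrix} \tilde{x}_{n+1|n} \\ m_{n+1} \end{bmatrix} = \begin{bmatrix} \mathcal{A}_{11}(n) & \mathcal{A}_{12}(n) \\ \mathcal{A}_{21}(n) & \mathcal{A}_{22}(n) \end{bmatrix} \begin{bmatrix} \tilde{x}_{n|n-1} \\ m_n \end{bmatrix} + \begin{bmatrix} I \\ 0 \end{bmatrix} w_n + \begin{bmatrix} \Delta_1(n) \\ \Delta_2(n) \end{bmatrix} v_n \tag{B-5} $$

where

$$ \mathcal{A}_{11}(n) = \Phi_n - C_{n+1} B_n H_n \tag{B-6} $$

$$ \mathcal{A}_{12}(n) = (\Phi_n - C_{n+1} B_n H_n)\, C_n - C_{n+1}(A_n - B_n G_n) \tag{B-7} $$

$$ \mathcal{A}_{21}(n) = B_n H_n \tag{B-8} $$

$$ \mathcal{A}_{22}(n) = A_n + B_n (H_n C_n - G_n) \tag{B-9} $$

and

$$ \Delta_1(n) = -C_{n+1} B_n \tag{B-10} $$

$$ \Delta_2(n) = B_n \tag{B-11} $$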

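Define

$$ \Pi_n = E\left\{ \begin{bmatrix} \tilde{x}_{n|n-1} \\ m_n \end{bmatrix} \begin{bmatrix} \tilde{x}_{n|n-1}^T & m_n^T \end{bmatrix} \right\} = E \begin{bmatrix} \tilde{x}_{n|n-1} \tilde{x}_{n|n-1}^T & \tilde{x}_{n|n-1} m_n^T \\ m_n \tilde{x}_{n|n-1}^T & m_n m_n^T \end{bmatrix} \tag{B-12} $$

and the previous equation gives

$$ \Pi_{n+1} = \mathcal{A}_n \Pi_n \mathcal{A}_n^T + \Delta_n R_n \Delta_n^T + \begin{bmatrix} Q_n & 0 \\ 0 & 0 \end{bmatrix} \tag{B-13} $$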

This recursion relation is started with

$$ \Pi_0 = \begin{bmatrix} P_0 + (I - C_0 T_0)\, X_0\, (I - C_0 T_0)^T & (I - C_0 T_0)\, X_0\, T_0^T \\ T_0\, X_0\, (I - C_0 T_0)^T & T_0\, X_0\, T_0^T \end{bmatrix} \tag{B-14} $$

where

$$ X_0 = E[x_0]\, E[x_0]^T \tag{B-15} $$

$$ P_0 = E\left[(x_0 - E[x_0])(x_0 - E[x_0])^T\right] \tag{B-16} $$

$m_0$ is chosen to be $T_0 E[x_0]$, and

$$ C_0 = E[x_0 m_0^T]\, E[m_0 m_0^T]^{-1} = X_0 T_0^T \left[T_0 X_0 T_0^T\right]^{-1} \tag{B-17} $$

The state estimation error is defined to be

$$ \tilde{x}_{n|n} = x_n - \hat{x}_{n|n} \tag{B-18} $$

which can be expanded to


$$ \begin{aligned} \tilde{x}_{n|n} &= x_n - C_n m_n - K_n (H_n x_n + v_n - G_n m_n) \\ &= (I - K_n H_n)\, x_n - (C_n - K_n G_n)\, m_n - K_n v_n \end{aligned} \tag{B-19} $$


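Recalling that $x_n = \tilde{x}_{n|n-1} + C_n m_n$ gives

$$ \begin{aligned} \tilde{x}_{n|n} &= (I - K_n H_n)\, \tilde{x}_{n|n-1} + \left[(I - K_n H_n) C_n - (C_n - K_n G_n)\right] m_n - K_n v_n \\ &= (I - K_n H_n)\, \tilde{x}_{n|n-1} + K_n (G_n - H_n C_n)\, m_n - K_n v_n \\ &= \begin{bmatrix} \Psi_1(n) & \Psi_2(n) \end{bmatrix} \begin{bmatrix} \tilde{x}_{n|n-1} \\ m_n \end{bmatrix} - K_n v_n \end{aligned} \tag{B-20} $$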


Then the estimation error covariance is

$$ P_n = E\left[\tilde{x}_{n|n}\, \tilde{x}_{n|n}^T\right] = \Psi_n \Pi_n \Psi_n^T + K_n R_n K_n^T \tag{B-21} $$

with $\Psi_n = [\Psi_1(n)\ \ \Psi_2(n)]$.